Add parallel-task-set crate, test it, use it #8174

Merged: 2 commits into main from parallel-task-set, May 19, 2025

Conversation

@smklein (Collaborator) commented May 15, 2025

Follow-up from support bundles work.

This crate exposes a "JoinSet"-like interface which also has a bound on maximum parallelism.
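
To illustrate the idea, here is a minimal sketch of one way such an interface can be built on top of tokio: a `JoinSet` whose futures each acquire a `Semaphore` permit before doing any work. This is illustrative only, not the crate's actual API or implementation; the type and method names here are made up.

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;
use tokio::task::{JoinError, JoinSet};

/// Hypothetical sketch: a JoinSet-like set in which no more than
/// `max_parallelism` tasks *execute* at once.
struct BoundedSet<T> {
    tasks: JoinSet<T>,
    semaphore: Arc<Semaphore>,
}

impl<T: Send + 'static> BoundedSet<T> {
    fn new(max_parallelism: usize) -> Self {
        Self {
            tasks: JoinSet::new(),
            semaphore: Arc::new(Semaphore::new(max_parallelism)),
        }
    }

    /// Spawns the task immediately, but it does no work until a
    /// semaphore permit is available.
    fn spawn<F>(&mut self, future: F)
    where
        F: std::future::Future<Output = T> + Send + 'static,
    {
        let semaphore = Arc::clone(&self.semaphore);
        self.tasks.spawn(async move {
            let _permit = semaphore
                .acquire_owned()
                .await
                .expect("semaphore is never closed");
            future.await
        });
    }

    /// Waits for the next task to complete, JoinSet-style.
    async fn join_next(&mut self) -> Option<Result<T, JoinError>> {
        self.tasks.join_next().await
    }
}
```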

@hawkw (Member) left a comment

looks good to me, with a couple of nits. there are some places in Nexus we could use this too!

@hawkw (Member) commented May 16, 2025

Test failure in build-and-test (helios) looks like a flake (in one of my tests, agh).

@smklein merged commit b6eff04 into main on May 19, 2025 (18 checks passed)
@smklein deleted the parallel-task-set branch on May 19, 2025 at 15:30

@hawkw added a commit that referenced this pull request on May 20, 2025:
Presently, several Nexus background tasks use `tokio::task::JoinSet` to
run a large number of Tokio tasks in parallel. In these tasks, we
typically set a concurrency limit on the number of spawned tasks using
the size of a database query that's used to determine the tasks that
should be spawned. We perform the query with a small page size, spawn a
group of tasks, and wait for them to complete, in a loop, until the
query returns no records.

While this is simple to implement, it's not the ideal way to do this, as
it will unnecessarily limit the throughput of the spawned tasks. This is
because this pattern does not ensure that *exactly* `$CONCURRENCY_LIMIT`
tasks are running at a given time; it only ensures that *up to*
`$CONCURRENCY_LIMIT` tasks are running. Since the database is not
queried again to spawn a new batch of tasks until after the *entire*
batch of tasks complete, there will always be some period of time during
which only a single task is running and all the others have completed.
If there's a relatively large variation in how long those tasks take to
complete, one slow task can potentially prevent any others from starting
for a longish period of time. An alternative approach, where the tasks
are spawned all at once but made to wait on a `tokio::sync::Semaphore`
before they actually begin executing, allows us to maximize throughput
while limiting concurrency. In this approach, a new task will begin
executing immediately as soon as another task finishes, so there are
always exactly `$CONCURRENCY_LIMIT` tasks running until the final batch
of tasks begins to complete.
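
For contrast, the batch-wait pattern looks roughly like this, where `fetch_next_page` and `handle` are hypothetical placeholders for the paginated query and the per-record work:

```rust
// Sketch of the batch-wait pattern described above. Concurrency never
// exceeds the page size, but it also decays toward a single running
// task at the tail of every batch.
loop {
    // Hypothetical paginated query returning up to LIMIT records.
    let batch = fetch_next_page(CONCURRENCY_LIMIT).await;
    if batch.is_empty() {
        break;
    }
    let mut tasks = tokio::task::JoinSet::new();
    for record in batch {
        tasks.spawn(handle(record)); // hypothetical per-record work
    }
    // No new work can start until the *slowest* task in this batch
    // finishes; this is the throughput gap described above.
    while let Some(_result) = tasks.join_next().await {}
}
```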

The `parallel-task-set` crate added in PR #8174 implements a reusable
abstraction for this, so this branch updates the `instance_watcher` and
`webhook_deliverator` background tasks to use it. Furthermore, @smklein
and I spent some time tweaking the `ParallelTaskSet` API to make it
easier to limit not only the number of tasks _executing_ in parallel,
but also the number of tasks _resident in memory_ at any given time, by
changing the `spawn` method to wait for a previous task to complete
if the set is already at the limit.
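
A hedged sketch of that `spawn` behavior, extending the `BoundedSet` illustration from the PR description above (again, not the actual `ParallelTaskSet` code):

```rust
impl<T: Send + 'static> BoundedSet<T> {
    /// Illustrative only: spawn with backpressure. If `limit` tasks
    /// are already resident, first reap one completed task and hand
    /// its result back to the caller, so that at most `limit` tasks
    /// exist (running or waiting on a permit) at any given time.
    async fn spawn_with_backpressure<F>(
        &mut self,
        future: F,
        limit: usize,
    ) -> Option<T>
    where
        F: std::future::Future<Output = T> + Send + 'static,
    {
        let mut completed = None;
        if self.tasks.len() >= limit {
            completed = self
                .tasks
                .join_next()
                .await
                .map(|result| result.expect("task panicked"));
        }
        self.spawn(future);
        completed
    }
}
```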

Note that I did *not* change the `instance_updater` background task to
use `ParallelTaskSet` in this manner, as all it does is run
`instance_update` sagas. Unlike the `instance_watcher` and
`webhook_deliverator` background tasks, which make HTTP requests to
sled-agents and external webhook endpoints, respectively, this task just
spawns sagas and waits for them to finish. So, its spawned tasks aren't
actually doing any _work_ besides waiting on a `RunningSaga` future to
complete, and the actual work is performed in the saga executor.
Limiting the concurrency of the actual work would require the limit to
be implemented in the saga executor rather than in the background task.
Also, it's important that all the sagas be _started_ as soon as
possible, even if the current Nexus does not execute them, so that they
may be picked up by other Nexii.

Similarly, the `instance_reincarnation` task also performs a
query-spawn-batch-wait type loop, but in that case, it's necessary: the
saga started for each instance in the query performs the state change
that evicts that instance from subsequent queries. Therefore, that task
_must_ wait for all sagas in the batch to complete before proceeding.